JSON Data


LLMER: Crafting Interactive Extended Reality Worlds with JSON Data Generated by Large Language Models

Chen, Jiangong, Wu, Xiaoyi, Lan, Tian, Li, Bin

arXiv.org Artificial Intelligence

The integration of Large Language Models (LLMs) like GPT-4 with Extended Reality (XR) technologies offers the potential to build truly immersive XR environments that interact with human users through natural language, e.g., generating and animating 3D scenes from audio inputs. However, the complexity of XR environments makes it difficult to accurately extract relevant contextual data and scene/object parameters from an overwhelming volume of XR artifacts. This leads not only to increased costs under pay-per-use models, but also to elevated levels of generation errors. Moreover, existing approaches focused on coding script generation are often prone to generation errors, resulting in flawed or invalid scripts, application crashes, and ultimately a degraded user experience. To overcome these challenges, we introduce LLMER, a novel framework that creates interactive XR worlds using JSON data generated by LLMs. Unlike prior approaches focused on coding script generation, LLMER translates natural language inputs into JSON data, significantly reducing the likelihood of application crashes and processing latency. It employs a multi-stage strategy to supply only the essential contextual information adapted to the user's request and features multiple modules designed for various XR tasks. Our preliminary user study reveals the effectiveness of the proposed system, with over 80% reduction in consumed tokens and around 60% reduction in task completion time compared to state-of-the-art approaches. The analysis of users' feedback also illuminates a series of directions for further optimization.
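The abstract does not specify LLMER's JSON schema, so the sketch below only illustrates the general idea of generating data instead of scripts: parse and validate an LLM's JSON output before applying it to the scene, so malformed output is rejected rather than crashing the application. The payload shape, field names, and `apply_llm_command` helper are assumptions, not LLMER's actual interface.

```python
import json

# Hypothetical JSON an LLM might return for "add a red cube above the table";
# the real LLMER schema is not given in the abstract.
llm_output = """
{
  "action": "create_object",
  "object": {"shape": "cube", "color": "red", "position": [0.0, 1.2, 0.5]}
}
"""

REQUIRED_KEYS = {"action", "object"}

def apply_llm_command(raw: str, scene: list) -> bool:
    """Validate LLM-generated JSON before touching the scene.

    Malformed JSON or missing fields are ignored instead of executed,
    which is the core safety benefit of data generation over script generation.
    """
    try:
        cmd = json.loads(raw)
    except json.JSONDecodeError:
        return False  # malformed output: skip rather than crash
    if not REQUIRED_KEYS.issubset(cmd):
        return False  # schema violation: skip
    if cmd["action"] == "create_object":
        scene.append(cmd["object"])  # stand-in for the XR engine's object factory
        return True
    return False

scene_objects: list = []
print(apply_llm_command(llm_output, scene_objects), scene_objects)
```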


DENIAHL: In-Context Features Influence LLM Needle-In-A-Haystack Abilities

Dai, Hui, Pechi, Dan, Yang, Xinyi, Banga, Garvit, Mantri, Raghav

arXiv.org Artificial Intelligence

The Needle-in-a-Haystack (NIAH) test is a general task used to assess language models' (LMs') abilities to recall particular information from long input contexts. This framework, however, does not provide a means of analyzing what factors, beyond context length, contribute to LMs' abilities or inabilities to separate and recall needles from their haystacks. To provide a systematic means of assessing which features contribute to LMs' NIAH capabilities, we developed a synthetic benchmark called DENIAHL (Data-oriented Evaluation of NIAH for LLM's). Our work expands on previous NIAH studies by ablating NIAH features beyond typical context length, including data type, size, and patterns. We find stark differences between GPT-3.5's and LLaMA 2-7B's performance on DENIAHL, with drops in recall performance when features like item size are increased, and to some degree when the data type is changed from numbers to letters. This has implications for increasingly large context models, demonstrating that factors beyond item count impact NIAH capabilities.
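To make the ablation idea concrete, here is a minimal sketch of a NIAH-style generator with controllable item size and data type, two of the features DENIAHL varies. The generation procedure, key/value formats, and prompt wording are illustrative assumptions, not the benchmark's actual recipe.

```python
import random
import string

def make_haystack(n_items: int, key_len: int, value_type: str = "number"):
    """Build a synthetic key-value haystack and pick one pair as the needle."""
    def rand_key() -> str:
        return "".join(random.choices(string.ascii_lowercase, k=key_len))

    def rand_value() -> str:
        if value_type == "number":
            return str(random.randint(0, 10**6))
        return "".join(random.choices(string.ascii_uppercase, k=7))

    pairs = {rand_key(): rand_value() for _ in range(n_items)}
    needle_key = random.choice(list(pairs))
    context = "\n".join(f"{k}: {v}" for k, v in pairs.items())
    prompt = f"{context}\n\nWhat is the value associated with '{needle_key}'?"
    return prompt, pairs[needle_key]

# Vary key_len or value_type to probe how item size and data type affect recall.
prompt, answer = make_haystack(n_items=50, key_len=8, value_type="letters")
print(prompt.splitlines()[0], "... expected:", answer)
```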


PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation

Pawelec, Aneta, Wesołowska, Victoria Sara, Bączek, Zuzanna, Sankowski, Piotr

arXiv.org Artificial Intelligence

The ability of large language models (LLMs) to interpret visual representations of data is crucial for advancing their application in data analysis and decision-making processes. This paper presents a novel synthetic dataset designed to evaluate the proficiency of LLMs in interpreting various forms of data visualization, including time series plots, histograms, violin plots, box plots, and cluster plots. Our dataset is generated using controlled parameters to ensure comprehensive coverage of potential real-world scenarios. We employ multimodal text prompts with questions about the visual data in the images to benchmark several state-of-the-art models, such as ChatGPT and Gemini, assessing their understanding and interpretative accuracy. To ensure data integrity, our benchmark dataset is generated automatically, making it entirely new and free from prior exposure to the models being tested. This strategy allows us to evaluate the models' ability to truly interpret and understand the data, eliminating the possibility of pre-learned responses and allowing for an unbiased evaluation of the models' capabilities. We also introduce quantitative metrics to assess the performance of the models, providing a robust and comprehensive evaluation tool. Benchmarking several state-of-the-art LLMs with this dataset reveals varying degrees of success, highlighting specific strengths and weaknesses in interpreting diverse types of visual data. The results provide valuable insights into the current capabilities of LLMs and identify key areas for improvement. This work establishes a foundational benchmark for future research and development aimed at enhancing the visual interpretative abilities of language models. In the future, improved LLMs with robust visual interpretation skills can significantly aid automated data analysis, scientific research, educational tools, and business intelligence applications.
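The following sketch illustrates the controlled-generation idea behind such a benchmark: synthesize data from known parameters, render the plot, and keep a ground-truth question-answer pair. The generation pipeline, question template, and file names are assumptions for illustration, not PUB's actual procedure.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
slope, intercept = 2.5, 10.0          # controlled parameters serve as ground truth
x = np.arange(100)
y = slope * x + intercept + rng.normal(0, 5, size=x.size)

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel("time step")
ax.set_ylabel("value")
fig.savefig("synthetic_timeseries.png")  # image later shown to the multimodal model

qa_pair = {
    "image": "synthetic_timeseries.png",
    "question": "Is the overall trend in this time series increasing or decreasing?",
    "answer": "increasing" if slope > 0 else "decreasing",
}
print(qa_pair)
```

Because the data are freshly generated, the answer cannot have been memorized from training data, which is the unbiased-evaluation point the abstract makes.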


Comprehending Semantic Types in JSON Data with Graph Neural Networks

Wei, Shuang, Mior, Michael J.

arXiv.org Artificial Intelligence

Semantic types are a more powerful and detailed way of describing data than atomic types such as strings or integers. They establish connections between columns and real-world concepts, providing more nuanced and fine-grained information that can be useful for tasks such as automated data cleaning, schema matching, and data discovery. Existing deep learning models trained on large text corpora have been successful at single-column semantic type prediction for relational data. In this work, however, we propose an extension of the semantic type prediction problem to JSON data, labeling types based on JSON Paths. JSON Path is a query language that enables navigation of complex JSON data structures by specifying the location and content of elements, so a JSON Path plays a role analogous to a column in relational data. We use a graph neural network to comprehend the structural information within collections of JSON documents. Our model outperforms an existing state-of-the-art model in several cases. These results demonstrate the ability of our model to understand complex JSON data and its potential for JSON-related data processing tasks.
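A small sketch of the column analogy: enumerate the JSON Paths in a document and the leaf values they point to, each path being a unit that could receive a semantic type label. This only illustrates path extraction; the graph neural network itself is not sketched, and the helper function is hypothetical.

```python
def json_paths(node, prefix="$"):
    """Yield (JSON Path, leaf value) pairs for a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from json_paths(value, f"{prefix}.{key}")
    elif isinstance(node, list):
        for item in node:
            yield from json_paths(item, f"{prefix}[*]")
    else:
        yield prefix, node

doc = {"user": {"name": "Ada", "email": "ada@example.org"},
       "orders": [{"total": 12.5}, {"total": 7.0}]}

for path, value in json_paths(doc):
    print(path, "->", value)
# $.user.name -> Ada
# $.user.email -> ada@example.org
# $.orders[*].total -> 12.5 and 7.0
```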


Setting up Amazon Personalize with AWS Glue

#artificialintelligence

Data can be used in a variety of ways to satisfy the needs of different business units, such as marketing, sales, or product. In this post, we focus on using data to create personalized recommendations to improve end-user engagement. Most ecommerce applications consume a huge amount of customer data that can be used to provide personalized recommendations; however, that data may not be cleaned or in the right format to provide those valuable insights. The goal of this post is to demonstrate how to use AWS Glue to extract, transform, and load your JSON data into a cleaned CSV format. We then show you how to run a recommendation engine powered by Amazon Personalize on your user interaction data to provide a tailored experience for your customers.
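As a rough sketch of the extract-transform-load step described above, here is plain PySpark code (a Glue job runs the same Spark code, optionally through Glue's DynamicFrame API) that reads JSON interaction events and writes the cleaned CSV Amazon Personalize expects, with USER_ID, ITEM_ID, and TIMESTAMP columns. The S3 paths and source field names are placeholders, not the ones from the original post.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("json-to-personalize-csv").getOrCreate()

# One JSON object per line, e.g. exported clickstream events.
raw = spark.read.json("s3://my-bucket/raw/interactions/")

interactions = (
    raw.select(
        F.col("userId").cast("string").alias("USER_ID"),
        F.col("itemId").cast("string").alias("ITEM_ID"),
        F.unix_timestamp("eventTime").alias("TIMESTAMP"),  # Personalize expects Unix seconds
    )
    .dropna()            # drop incomplete events
    .dropDuplicates()    # basic cleaning
)

interactions.write.mode("overwrite").option("header", True).csv(
    "s3://my-bucket/clean/interactions_csv/"
)
```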


Ingesting Apache MXNet Gluon Deep Learning Results Via MQTT and Apache NiFi - DZone AI

#artificialintelligence

We're using a pre-trained model in Apache MXNet Gluon Python 3 code to classify a webcam image captured and processed with OpenCV. In our Python script, we save the image to disk and record JSON metadata containing the prediction percentage, class probabilities, and device information. This JSON data is then sent via MQTT to a broker, and Apache NiFi processes the JSON data. A side effect of the process is that it produces a SQL DDL statement to create a new table for this schema.
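A condensed sketch of that pipeline is below: capture a frame with OpenCV, persist it, assemble the JSON metadata, and publish it over MQTT (using paho-mqtt) for NiFi to consume. The Gluon inference step is replaced by a placeholder result, and the broker host, topic, and field names are assumptions rather than the ones used in the original article.

```python
import json
import time
import uuid

import cv2
import paho.mqtt.client as mqtt

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
cap.release()

if ok:
    image_path = f"/tmp/webcam_{uuid.uuid4().hex}.jpg"
    cv2.imwrite(image_path, frame)               # image persisted to disk

    result = {                                    # stand-in for the Gluon model's output
        "label": "tabby cat",
        "probability": 0.87,
        "image": image_path,
        "device": "desktop-01",
        "timestamp": int(time.time()),
    }

    client = mqtt.Client()
    client.connect("broker.local", 1883)          # MQTT broker that NiFi subscribes to
    client.publish("gluon/classifications", json.dumps(result))
    client.disconnect()
```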


Big data and the risks of using NoSQL databases

#artificialintelligence

NoSQL databases typically represent their data models with procedural, implementation-specific structures expressed in a JSON format. JavaScript was originally developed to handle tasks in the browser and was later standardized by the ECMA International standards body as ECMAScript; JavaScript Object Notation (JSON), a lightweight format for interchanging data over the Internet, was in turn derived from JavaScript's object literal syntax. The downside of JSON is that it lacks the capability to enforce referential integrity, and these data models are neither interoperable nor standardized.
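A minimal illustration of the referential-integrity gap: nothing in JSON itself stops one document from pointing at another document that does not exist, so the check must live in application code. The collection and field names below are made up for the example.

```python
import json

customers = [json.loads('{"id": 1, "name": "Ada"}')]
orders = [
    json.loads('{"id": 101, "customer_id": 1}'),
    json.loads('{"id": 102, "customer_id": 7}'),  # dangling reference: no customer 7
]

# In a relational database a foreign key would reject order 102;
# with JSON documents the application has to detect it manually.
known_ids = {c["id"] for c in customers}
dangling = [o for o in orders if o["customer_id"] not in known_ids]
print(dangling)  # [{'id': 102, 'customer_id': 7}]
```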


Building a SEO tool with Machine Learning MonkeyLearn Blog

#artificialintelligence

A few weeks ago, Moz CEO Rand Fishkin approached the MonkeyLearn team with a question that later turned into a project. The goal was to build an online tool that provides great value to the SEO industry. We also wanted to showcase what can be developed with machine learning technologies using MonkeyLearn. In short, SEOs can use this tool to compare their website's keywords to those on the Google search results for a related term. Rand presented this Keyword Comparison Extractor in his keynote at MozCon 2015, the largest SEO conference out there, with more than 1,500 attendees and speakers from companies like Google, Buffer, Optimizely, Unbounce, Basecamp, and others.